# GPTQ Quantization

**Qwen3 Reranker 4B W4A16 G128**
Apache-2.0 · boboliu · 157 downloads · 1 like · Large Language Model, Transformers
A GPTQ quantization (W4A16, group size 128) of Qwen/Qwen3-Reranker-4B that significantly reduces VRAM usage.
**Qwen3 Embedding 0.6B W4A16 G128**
Apache-2.0 · boboliu · 131 downloads · 2 likes · Text Embedding
A GPTQ quantization of Qwen3-Embedding-0.6B that reduces VRAM usage with minimal performance loss.
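
As a rough illustration of how a quantized embedding checkpoint like this is typically used, the sketch below loads it through the sentence-transformers library and computes a pair of similarity scores. The repository ID is reconstructed from the listing above and may not match the exact name on the hub, and a GPTQ-capable backend (for example the gptqmodel or auto-gptq package) is assumed to be installed alongside transformers.

```python
# Minimal sketch: embedding texts with a GPTQ-quantized Qwen3-Embedding model.
# The repo ID below is an assumption based on the listing; verify it on the hub.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "boboliu/Qwen3-Embedding-0.6B-W4A16-G128",  # assumed repository ID
    device="cuda",
)

sentences = [
    "GPTQ quantization reduces VRAM usage.",
    "Weight-only quantization shrinks model memory footprints.",
]

# encode() returns a (num_sentences, hidden_dim) numpy array; normalizing
# the rows lets a plain dot product act as cosine similarity.
embeddings = model.encode(sentences, normalize_embeddings=True)
print(embeddings @ embeddings.T)
```
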
**Qwen3 0.6B GPTQ Int8**
Apache-2.0 · Qwen · 1,231 downloads · 3 likes · Large Language Model, Transformers
An 8-bit GPTQ quantization of Qwen3-0.6B, the 0.6B-parameter model in the latest Qwen series, which supports switching between thinking and non-thinking modes and offers strong reasoning, instruction-following, and agent capabilities.
**Qwen3 1.7B GPTQ Int8**
Apache-2.0 · Qwen · 635 downloads · 1 like · Large Language Model, Transformers
An 8-bit GPTQ quantization of Qwen3-1.7B from the Tongyi Qianwen (Qwen) series, supporting switching between thinking and non-thinking modes with improved reasoning and multilingual support.
**Orpheus 3B 0.1 FT W8A8**
Apache-2.0 · nytopop · 173 downloads · 0 likes · Large Language Model, Transformers, English
A W8A8 quantization of Orpheus-3B-0.1-FT, a text-to-speech model built on a causal language model that supports efficient quantized compression.
**Qwen2.5 VL 3B Instruct GPTQ Int4**
Apache-2.0 · hfl · 1,312 downloads · 2 likes · Image-to-Text, Transformers, Multilingual
A GPTQ-Int4 quantization of Qwen2.5-VL-3B-Instruct for multimodal image-to-text and text-to-text tasks, supporting both Chinese and English.
**Meta Llama 3.1 8B Instruct GPTQ INT4**
hugging-quants · 128.18k downloads · 25 likes · Large Language Model, Transformers, Multilingual
An INT4 quantization of Meta-Llama-3.1-8B-Instruct produced with the GPTQ algorithm, suitable for multilingual dialogue scenarios.
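
Checkpoints such as the hugging-quants build above are normally loaded through the standard Transformers API, which reads the GPTQ configuration stored in the repository. The sketch below is a minimal example under that assumption; it requires torch, transformers, and a GPTQ kernel backend (such as gptqmodel or auto-gptq), and the repository ID is reconstructed from the listing, so it should be verified before use.

```python
# Minimal sketch: chat-style generation with a GPTQ INT4 checkpoint.
# Repo ID reconstructed from the listing above; verify it before use.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",  # place the already-quantized weights on available GPUs
)

messages = [{"role": "user", "content": "Explain GPTQ quantization in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
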
**Five Phases Mindset**
GPL-3.0 · cookey39 · 14 downloads · 2 likes · Large Language Model, Transformers, Chinese
A Qwen-based traditional Chinese medicine (TCM) consultation model that integrates Five Elements theory to provide personalized TCM diagnostic services.
**Llama 2 13B GPTQ**
TheBloke · 538 downloads · 121 likes · Large Language Model, Transformers, English
A GPTQ quantization of Meta's Llama 2 13B model for efficient inference.